Part 1

The elbow in the given figure could be at 2 or 5. Since the elbow at 2 is slightly sharper, we use 2. This split could correspond to the distinction between hatchback and non-hatchback types, since body type and weight differ significantly between hatchbacks and other car types.

We see that in hierarchical clustering the longest vertical line in the dendrogram appears at the last merge. Cutting across it at a height of around 30 therefore gives 2 clusters.
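The dendrogram cut described above can be sketched with SciPy; the data matrix here is a synthetic stand-in for the report's car dataset, and Ward linkage is an assumption about which linkage was used.

```python
# Minimal sketch of cutting a dendrogram to obtain 2 clusters; the data
# is a random placeholder, not the actual car dataset.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.RandomState(0).rand(20, 4)  # placeholder for the real data
Z = linkage(X, method="ward")             # assumed linkage for the dendrogram

# Cutting below the tallest merge (the longest vertical line) yields 2
# clusters; criterion="distance" with t=30 would cut at a fixed height instead.
labels = fcluster(Z, t=2, criterion="maxclust")
print(sorted(set(labels)))  # the two cluster labels
```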

The optimal number of clusters for each method is given after the description of that method.

K-means is a centroid-based clustering method in which the number of centroids must be supplied to the function beforehand, so we need the elbow method to choose the optimal number of clusters. Hierarchical clustering instead builds a dendrogram, which can be cut to obtain any specified number of clusters; the optimal number is found by cutting the dendrogram across its longest vertical line. The two methods are therefore vastly different.
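The elbow method mentioned above can be sketched as follows; the feature matrix here is a synthetic placeholder for the report's (presumably scaled) car data.

```python
# Minimal sketch of the elbow method: fit K-means for a range of k and
# look for the k where the inertia curve bends. X is a random placeholder.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(0).rand(50, 4)  # placeholder for the real data

ks = range(1, 11)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squares

# Plotting ks against inertias and picking the sharpest bend gives the
# "elbow"; here we just print the curve.
for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 2))
```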

The optimal number of clusters is thus 2 in both the K-means and hierarchical clustering methods.

Conclusion, part 1

In the above analysis we built K-means and hierarchical clustering models on the data.
To provide a better analysis in the future:

  1. Data points about more cars could be collected.
  2. More features could be included.

Part 2

The KNN model will help impute the missing values in the data, owing to its 100% prediction accuracy.
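The KNN-based imputation described above can be sketched as follows; the synthetic data, the choice of k = 5, and the column being imputed are all assumptions standing in for the report's actual dataset.

```python
# Minimal sketch of imputing a categorical column with a KNN classifier:
# train on rows where the value is known, predict it for rows where it
# is missing. All data here is synthetic.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                      # stand-in feature matrix
y = (X[:, 0] > 0.5).astype(int)           # stand-in categorical column

mask = rng.rand(100) < 0.1                # pretend ~10% of y is missing
knn = KNeighborsClassifier(n_neighbors=5)  # assumed k
knn.fit(X[~mask], y[~mask])               # fit on rows with known values

y_imputed = y.copy()
y_imputed[mask] = knn.predict(X[mask])    # fill in the missing entries
print(int(mask.sum()), "values imputed")
```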

Part 3

Conclusion

Thus we reduced the dimensionality of the dataset by about half (from 18 to 9 features) without significantly affecting accuracy (a drop of about 2%).
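The 18-to-9 reduction above can be sketched with PCA; the data here is synthetic, and PCA itself is an assumption about which reduction technique the report used.

```python
# Minimal sketch of halving dimensionality from 18 to 9 features with
# PCA; X is a random placeholder for the real dataset.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(200, 18)  # stand-in 18-feature data

pca = PCA(n_components=9).fit(X)
X_reduced = pca.transform(X)

print(X_reduced.shape)  # (200, 9)
# Fraction of variance kept by the 9 retained components:
print(round(pca.explained_variance_ratio_.sum(), 3))
```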

Part 4

Part 5

Six dimensionality reduction techniques implementable in Python are as follows:

  1. PCA
  2. Singular Value Decomposition
  3. Linear Discriminant Analysis
  4. Isomap Embedding
  5. Locally Linear Embedding
  6. Modified Locally Linear Embedding
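All six of the techniques listed above are available in scikit-learn and share the same fit/transform API; the sketch below runs each on a small subset of the digits dataset, which is an assumption used only for illustration.

```python
# Minimal sketch applying the six listed techniques via scikit-learn's
# common fit/transform API; the digits subset is illustrative only.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]  # small subset to keep the sketch fast

reducers = {
    "PCA": PCA(n_components=2),
    "SVD": TruncatedSVD(n_components=2),
    "LDA": LinearDiscriminantAnalysis(n_components=2),
    "Isomap": Isomap(n_components=2),
    "LLE": LocallyLinearEmbedding(n_components=2),
    "Modified LLE": LocallyLinearEmbedding(n_components=2, method="modified"),
}

embeddings = {}
for name, reducer in reducers.items():
    # LDA is supervised, so it also needs the class labels.
    if name == "LDA":
        embeddings[name] = reducer.fit_transform(X, y)
    else:
        embeddings[name] = reducer.fit_transform(X)
    print(name, embeddings[name].shape)
```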